Is Error-Based Pruning Redeemable?

Authors

  • Lawrence O. Hall
  • Kevin W. Bowyer
  • Robert E. Banfield
  • Steven Eschrich
  • Richard Collins
Abstract

Error-based pruning can be used to prune a decision tree and it does not require the use of validation data. It is implemented in the widely used C4.5 decision tree software. It uses a parameter, the certainty factor, that affects the size of the pruned tree. Several researchers have compared error-based pruning with other approaches, and have shown results that suggest that error-based pruning results in larger trees that give no increase in accuracy. They further suggest that as more data is added to the training set, the tree size after applying error-based pruning continues to grow even though there is no increase in accuracy. It appears that these results were obtained with the default certainty factor value. Here, we show that varying the certainty factor allows significantly smaller trees to be obtained with minimal or no accuracy loss. Also, the growth of tree size with added data can be halted with an appropriate choice of certainty factor. Methods of determining the certainty factor are discussed for both small and large data sets. Experimental results support the conclusion that error-based pruning can be used to produce appropriately sized trees with good accuracy when compared with reduced error pruning.
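The certainty factor governs how pessimistic C4.5's error estimate at each node is: a subtree is collapsed to a leaf when the leaf's estimated error is no worse than the subtree's. The sketch below illustrates the idea with the normal approximation to the binomial upper confidence bound that is commonly used to describe C4.5's estimate (the actual C4.5 sources compute the binomial limit more exactly, so treat this as illustrative, not as the paper's implementation):

```python
from statistics import NormalDist

def pessimistic_error(errors, n, cf=0.25):
    """Upper confidence bound on a node's true error rate, given
    `errors` misclassified cases out of `n` covered cases.
    cf is the certainty factor (C4.5's default is 0.25); a smaller
    cf yields a more pessimistic estimate and hence heavier pruning."""
    z = NormalDist().inv_cdf(1 - cf)  # one-sided z-score for the chosen CF
    f = errors / n                    # observed error rate
    num = f + z * z / (2 * n) + z * (f / n - f * f / n + z * z / (4 * n * n)) ** 0.5
    return num / (1 + z * z / n)

# A node covering 20 cases with 2 training errors: shrinking the
# certainty factor raises the estimated error, which makes replacing
# the subtree with a leaf more attractive.
print(pessimistic_error(2, 20, cf=0.25) < pessimistic_error(2, 20, cf=0.05))
```

This is why the certainty factor acts as a pruning-strength knob: lowering it inflates every node's error estimate, tipping more subtree-versus-leaf comparisons in favor of the leaf.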


Related papers

Pruning Decision Trees with Misclassification Costs (appears in ECML-98 as a research note; a longer version is available as ECE TR 98-3, Purdue University)

We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical com...


Pruning Decision Trees with Misclassification Costs (appears in ECML-98 as a research note)

We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical com...


An Empirical Comparison of Pruning Methods for Ensemble Classifiers

Many researchers have shown that ensemble methods such as Boosting and Bagging improve the accuracy of classification. Boosting and Bagging perform well with unstable learning algorithms such as neural networks or decision trees. Pruning decision tree classifiers is intended to make trees simpler and more comprehensible and to avoid over-fitting. However, it is known that pruning individual classif...


Error-Based Pruning of Decision Trees Grown on Very Large Data Sets Can Work!

It has been asserted that, using traditional pruning methods, growing decision trees with increasingly larger amounts of training data will result in larger tree sizes even when accuracy does not increase. With regard to error-based pruning, the experimental data used to illustrate this assertion have apparently been obtained using the default setting for pruning strength; in particular, using ...


Pruning Decision Trees with Misclassification Costs (08 Feb 1998)

We describe an experimental study of pruning methods for decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. W...



Journal:
  • International Journal on Artificial Intelligence Tools

Volume 12, Issue —

Pages —

Publication date: 2003